1. INTRODUCTION

Tracking applications help users determine their location and navigate to destinations [1]. 
According to a recent study of mobile phone users who use navigation, approximately 95% of smartphone 
owners had used a mapping application at least once, which means that maps are in daily use for most 
smartphone users [2]. Guiding users to particular locations in indoor environments is a challenging task, 
although systems for indoor areas have developed considerably in recent years. To take advantage of indoor 
tracking technology in various places and fields, a system must determine the current location and destination 
quickly. Indoor tracking must also be reliable in reaching the specified goal, be extensible as the place 
develops, and be independent of the area in which it is used [3]. 

It is still easy to get lost indoors, where global positioning system (GPS) satellite signals cannot be 
detected precisely enough for navigation applications. GPS reports the same position even when the person is 
on different floors [4]. Systems in this field have improved, become more accurate, and can now locate users 
in real time [5]. People spend most of their time indoors [6], and these improvements have allowed location 
systems to be used in several fields. Guidance system studies have been applied to monitoring in faculties, 
museums, and art galleries [7], [8], medicine [9]-[11], robotics [12], [13], education [14], and 
navigation [15]. 
These days, hybrid technologies are used to combine indoor and outdoor tracking in a single system. 
Some recent researchers are trying to eliminate the reliance of indoor tracking on predefined maps and to find 
alternative solutions, in addition to solving the problems of low lighting intensity and use in buildings with 
multiple floors [16]. 

In this paper, we discuss the limitations and challenges of indoor tracking. Limitations include light 
intensity, environment complexity, and multiple floors. Algorithms used for outdoor tracking differ from those 
used for indoor tracking. Most indoor tracking systems depend on predefined maps, which increases map-making 
costs. The paper also presents comparative studies of the various communication technologies and image 
detection algorithms used in tracking systems. 


2. METHOD 

In this survey, we restrict the search to studies that use various approaches to building indoor 
tracking systems. We include studies that use various positioning technologies as well as studies that use 
different feature detection algorithms. We selected these studies from diverse settings to address the problem 
of getting lost indoors, where GPS satellite signals cannot be detected precisely enough for indoor navigation 
systems. A broad literature search was conducted in IEEE Xplore, Scopus, ScienceDirect, the Egyptian 
Knowledge Bank (EKB), and Google Scholar. We include all retrieved studies that combine augmented 
reality, positioning techniques, and image processing algorithms. In addition, the reference lists of all 
retrieved papers were reviewed to identify other relevant articles. The studies in this survey are organized 
into three groups: active tracking [17]-[19], passive tracking [20]-[22], and hybrid tracking 
[5], [23], [24]. 

The taxonomy of indoor tracking systems concerns the localization of persons and objects within 
buildings. Indoor localization is a technical challenge because GPS does not work reliably in interior 
spaces [25]. Most indoor tracking systems use communication technologies or image-based detection 
algorithms. Figure 1 shows the techniques and algorithms used in each method. 
Figure 1. Indoor tracking systems taxonomy 


The communication-based positioning technologies in our smartphones, such as GPS or, more generally, 
the global navigation satellite system (GNSS) [26], do not work effectively inside homes, in areas covered by 
trees, or among concrete buildings. Indoor positioning is therefore currently achieved with wireless 
technologies such as wireless local area network (WLAN) [27], Bluetooth [28], near field communication 
(NFC) [29], ultrasound [30], infrared [31], and radio frequency identification (RFID) [32]. A comparison 
of these communication technologies is shown in Table 1. 

Each indoor positioning technology has its own characteristics, as summarized in Table 1. The choice of 
technology is therefore ultimately based on accuracy, cost, precision, ease of handling, scalability, and power 
consumption. For indoor positioning, the best solution is to fuse geographic information system (GIS) 
features with Wi-Fi technologies to get the best of both worlds [18]. 

Survey of indoor tracking systems using augmented reality (Ashraf Saad Shewail) 

404 ISSN: 2252-8938 


Table 1. Comparison between various communication technologies 

Technology                          Coverage          Power consumption   Accuracy    Cost
GPS                                 Outdoor           Very high           6-10 m      High
Wi-Fi                               Outdoor/indoor    High                15 m        Low
Bluetooth                           Indoor            Low                 2-5 m       High
RFID                                Indoor            Low                 1-2 m       Low
Ultra-wideband (UWB)                Outdoor/indoor    Low                 5-30 cm     High
Infrared                            Indoor            Low                 1-2 m       Medium
ZigBee                              Indoor            Low                 3-5 m       Low
Cellular                            Outdoor/indoor    Low                 50-150 m    High
Visible light communication (VLC)   Indoor            Low                 4-10 cm     Low
NFC                                 Indoor            Low                 4 cm        Low
Frequency modulation (FM)           Indoor            Low                 24 m        Low
Ultrasound                          Indoor            Low                 3 cm-1 m    Medium


Image-based detection algorithms can also be used to build indoor tracking systems. Feature detection 
is the initial processing step in computer vision: it extracts the interest points needed for the subsequent 
processing steps [33]. An interest point should be well-defined and stable under disturbances in the image 
region [34]. The scale-invariant feature transform (SIFT) is the most widely used image feature extractor; it 
is scale-invariant and rotation-invariant. To find candidate keypoints, it compares each pixel with its eight 
neighbors in the same scale and with the nine pixels in each of the adjacent scales above and below [35]. 
Because SIFT is slow, advanced applications required a faster alternative. The speeded-up robust features 
(SURF) algorithm is based on the same principles as SIFT but uses approximations to run much faster 
[36]; like SIFT, it is scale-invariant and rotation-invariant. The features from accelerated segment test 
(FAST) algorithm has the great advantage of being faster than many other popular feature extraction 
methods. In FAST, a pixel is a corner point if it is significantly brighter or darker than a contiguous arc of 
pixels on a circle around it [37]. To verify the performance of the feature extraction module, [23] tested 
FAST, SURF, SIFT, and FAST-SURF on feature points of indoor environment images; the results are 
shown in Table 2. The oriented FAST and rotated BRIEF (ORB) detector [11] combines the FAST keypoint 
detector with binary robust independent elementary features (BRIEF) descriptors. It is a free alternative to 
SIFT and SURF that outperforms them in computation time and performance. 
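The FAST segment test described above can be sketched as follows. This is a minimal pure-Python illustration, assuming a grayscale image given as a list of rows, the standard 16-pixel Bresenham circle of radius 3, a hypothetical intensity threshold `t = 20`, and a required arc length `n = 9` (the common FAST-9 variant); it omits the high-speed pre-test and non-maximum suppression used in practical implementations such as OpenCV's.

```python
from typing import List, Tuple

# Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3,
# listed in order around the circle as in the FAST detector.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img: List[List[int]], x: int, y: int, t: int = 20, n: int = 9) -> bool:
    """Return True if pixel (x, y) passes the FAST segment test.

    The pixel is a corner when at least n contiguous circle pixels are all
    brighter than I_p + t, or all darker than I_p - t.
    """
    p = img[y][x]
    # Label each circle pixel: +1 brighter, -1 darker, 0 similar.
    labels = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        labels.append(1 if v > p + t else -1 if v < p - t else 0)
    # Walk the circle twice so that arcs wrapping past index 0 are counted.
    for sign in (1, -1):
        run = 0
        for lab in labels + labels:
            run = run + 1 if lab == sign else 0
            if run >= n:
                return True
    return False


def detect_fast(img: List[List[int]], t: int = 20, n: int = 9) -> List[Tuple[int, int]]:
    """Test every pixel at least 3 pixels away from the image border."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, t, n)]
```

An isolated bright pixel passes the test (all 16 circle pixels are darker), while a pixel on a straight vertical edge does not, because only 7 contiguous circle pixels lie on the bright side.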


Table 2. Comparison between feature detection algorithms 

Algorithm    Number of features extracted    Feature extraction time (ms)    Real time
FAST         221                             20                              No
SURF         21                              22                              No
SIFT         7                               20                              Yes
FAST-SURF    26                              20                              Yes
ORB-SLAM     77                              20                              Yes


The ORB simultaneous localization and mapping (ORB-SLAM) algorithm better suits the needs of 
mobile augmented reality (AR) systems, such as feature detection speed, rotation invariance, and radiation 
invariance, and it can run in real time. As shown in Table 2, FAST extracts the most features but, like 
SURF, cannot be applied in real time. The ORB-SLAM algorithm is robust and can be applied in real time. 

During the past ten years, many researchers have sought to apply AR technology to tour guidance 
systems to increase users' motivation and knowledge. Because GPS cannot be used inside buildings, 
alternative methods for indoor guidance had to be investigated, and researchers have used different 
techniques to create indoor tracking systems. In this paper, these methods are divided into three categories: 
passive tracking, active tracking, and hybrid tracking [38]. 


2.1. Passive tracking 


Among the first attempts to solve indoor tracking problems with passive tracking techniques, [20] built 
a system based on laptops and universal serial bus (USB) webcams that capture live-view frames. The system 
uses ARToolKit to detect a marker and compare it with trained markers stored in a database as binary data. 
When a marker at a known location is detected, it is converted into a location ID for processing, and the 
route planner module calculates the route between the current location and the target. The Open Graphics 
Library (OpenGL) application programming interface (API) is used to load a virtual reality modeling 
language (VRML) model according to the camera coordinates estimated from the marker. Later research 
improved this indoor tracking approach. The enhanced system comprises a web camera, display glasses, and 
an input device, all connected to a Raspberry Pi and each serving a different function. The user starts the 
program by entering the destination, after which the camera begins capturing live images to identify 
location markers [21]. Once location markers are detected and recognized, the current marker data is fed 
into the route planner algorithm to decide which virtual object to display on the marker, based on the 
direction to the next marker or the final destination. Yadav et al. [22] build on these technological 
developments to increase ease of use and reduce cost by replacing the laptop or Raspberry Pi with a 
smartphone with a rear camera. Figure 2 shows the proposed system in [39]. 
Figure 2. Indoor tracking system workflow 


Because hardware devices are difficult to use and expensive, most researchers have turned to image 
detection on mobile phones. Most research in this category relies on predefined maps for indoor tracking and 
is based on marker-based tracking and mapping techniques. The SLAM algorithm overlays computer graphics 
(signs and directions) on the camera scene while simultaneously building and updating a two-dimensional 
map [40]. Al Delail et al. [41] merge augmented reality with the SLAM algorithm. The AR layer notifies the 
user of nearby points of interest through image-marker recognition, overlaying self-explanatory 
three-dimensional virtual objects associated with the location on the real-time video captured through the 
Vuforia software development kit (SDK). The Vuforia SDK is portable and fully configurable, and it allows 
object data to be downloaded from the cloud. 

In the same year, another study [42] concluded that fully detailed two-dimensional (2D) or 3D maps 
are unnecessary when an accurate indoor positioning method based on a fiducial marker system is used. This 
system uses continuous localization to inform users of their current position at all times, while discrete 
localization is required to discover and detect markers initially. The user chooses a target from the menu and 
is immediately taken to the viewfinder screen. Every time an image target is correctly detected in the camera 
frame with the Vuforia SDK, the Dijkstra algorithm [43] recalculates the path and gives directions to the 
user. The parallel tracking and mapping (PTAM) algorithm can build and extend a map while tracking the 
camera pose in an unknown environment, so augmented reality requires neither markers nor pre-made 
maps [44]. 
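The path recalculation performed with Dijkstra's algorithm [43] in [42] can be sketched as below. The floor graph, its node names, and the edge weights (distances in metres) are hypothetical; the loop simply recomputes the route from whichever marker was detected most recently, mirroring the behaviour described above rather than reproducing the implementation of [42].

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str, target: str) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, path) of the shortest path from source to target, or None."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale entry: node already settled with a smaller cost
        visited.add(node)
        if node == target:
            # Rebuild the path by following predecessors back to the source.
            path = [node]
            while node != source:
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return None


# Hypothetical floor graph: nodes are marker locations, weights are metres.
FLOOR: Graph = {
    "entrance": [("hall", 10.0), ("stairs", 4.0)],
    "stairs": [("entrance", 4.0), ("hall", 3.0), ("lab", 12.0)],
    "hall": [("entrance", 10.0), ("stairs", 3.0), ("lab", 5.0)],
    "lab": [("hall", 5.0), ("stairs", 12.0)],
}

# Each time a marker is recognised, the route to the chosen target is recomputed.
for detected in ("entrance", "hall"):
    cost, path = dijkstra(FLOOR, detected, "lab")
    print(detected, "->", path, cost)
# entrance -> ['entrance', 'stairs', 'hall', 'lab'] 12.0
# hall -> ['hall', 'lab'] 5.0
```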

ORB-SLAM [45] was introduced to extend the versatility of PTAM to environments that are intractable 
for that system. The real objective of a SLAM system is to build a map that can be used for accurate 
localization in the future; visual SLAM pursues this objective with camera sensors. ORB-SLAM was designed 
from scratch as a new monocular SLAM system with some new ideas and algorithms, while also 
incorporating excellent work developed in previous years, such as loop detection, the g2o optimization 
framework [46], and ORB features [47]. ORB-SLAM runs in real time in environments both large and 
small, indoor and outdoor. The system is robust to severe motion clutter and supports wide-baseline loop 
closure and relocalization. It has the same components as other SLAM algorithms (tracking, mapping, 
relocalization, and loop closure) and outperforms other state-of-the-art monocular SLAM techniques. 
ORB-SLAM3 [48] is the first real-time SLAM framework that supports visual, visual-inertial, and 
multi-map SLAM with monocular, stereo, and red green blue-depth (RGB-D) cameras, using pin-hole and 
fisheye lens models. ORB-SLAM3 uses its Atlas multi-map representation for accurate, seamless map 
merging, place recognition, camera relocalization, and loop closure. The system builds a unique DBoW2 [49] 
keyframe database for relocalization, loop closure, and map merging. ORB-SLAM3 is as robust as the best 
systems available in the literature and significantly more accurate. Its main point of failure is low-texture 
environments. 

Some AR-based indoor navigation systems enhance users' spatial learning in addition to guiding them 
to their destinations safely and quickly. To improve positioning accuracy, Bayesian estimation and the 
k-nearest neighbors (KNN) algorithm are used [50]. Recently, high-frequency radio frequency identification 
(HF RFID) has been integrated with Kalman filtering and Tukey smoothing to improve indoor tracking 
accuracy [51]. Liu and Meng [52] designed an indoor navigation interface on HoloLens, in which arrows aid 
orientation, and icons with text carry semantic meaning, serving as virtual landmarks that help with spatial 
learning. 

Many feature extractors, such as SURF, the gradient location and orientation histogram (GLOH), and 
SIFT [53], are well suited to applications such as image recognition and are used in marker-based 
applications. Later research built a system on a modified version of the SURF algorithm [54] that extracts 
real-world features and tracks objects. Objects are tracked with a projection (pose) matrix computed from the 
extracted features by homography techniques. The algorithm calculates the center pose and visualizes a 3D 
model over various images from a standard data set, and it was shown to be useful and practical for 
markerless mobile augmented reality. However, it was evaluated only on the data set and not in a real 
application [55]. 

Indoor tracking is also used in archaeological sites to improve the tourist experience. The MAGIC-EYES 
guidance system [56] uses augmented reality together with the phone's sensors, such as the camera and 
gyroscope. Markers such as plaques, stone tablets, and building patterns are used to recognize objects and to 
determine the tourists' viewing direction and geographical location. A pilot study compared the traditional 
guidance system with the developed augmented reality guidance system on twelve users, and the results 
showed that MAGIC-EYES performed much better than the traditional method. 
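Kalman filtering, which [51] combines with HF RFID to improve indoor tracking accuracy, can be illustrated with a minimal one-dimensional filter. This sketch assumes a constant-position model along a corridor, with hypothetical process noise `q` and measurement noise `r`; it is not the filter configuration used in [51], and the Tukey smoothing step is omitted.

```python
from typing import List


def kalman_1d(measurements: List[float], q: float = 0.01, r: float = 1.0,
              x0: float = 0.0, p0: float = 100.0) -> List[float]:
    """Smooth noisy 1-D position readings with a constant-position Kalman filter.

    q: process noise variance (how far the true position may drift per step)
    r: measurement noise variance of the positioning system
    x0, p0: initial estimate and its variance (a large p0 means little prior trust)
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: position assumed unchanged, so only the uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain k.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy readings (metres) of a user standing about 2 m along a corridor.
print(kalman_1d([2.3, 1.7, 2.4, 1.9, 2.1, 2.2, 1.8]))
```

As the gain `k` shrinks over time, each new reading moves the estimate less, which is what suppresses jitter in the displayed position.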

Continuing this line of development, the HyMoTrack system [57] was developed, in which the mapping 
phase builds small SLAM maps recorded with a wide-angle camera. The SLAM maps and 2D feature markers 
are integrated into a global reference map. The markers are also used to determine the user's start position 
when no prior information is available. While a detected image marker provides a position, the SLAM thread 
runs in parallel to find a match in the sub-map. The A* planning algorithm is used to calculate a path 
between two points, and Blender is used to place 3D content for augmented reality visualization. During 
path detection, the random sample consensus (RANSAC) algorithm is used to remove outlier features that 
do not belong to essential areas of the environment; even so, this did not solve all the problems of using a 
SLAM algorithm. The HyMoTrack system, which relies on a visual hybrid tracking strategy, was further 
enhanced with a 3D model generation algorithm that automatically generates a 3D mesh from a vectorized 
2D floor plan. The field of view path (FOVPath) technique [58] responds not only to the user's location and 
the target but also adapts to the performance of the visual positioning system. On the other hand, FOVPath 
depends on the view direction and the field-of-view (FOV) capabilities of the device in use. 
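A* path planning between two points, as used in HyMoTrack [57], can be sketched on an occupancy grid. The grid layout, the 4-connected unit-cost moves, and the Manhattan heuristic are illustrative assumptions; the actual map representation of [57] is not described here.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


def astar(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path on a grid of '.' (free) and '#' (wall) cells."""
    rows, cols = len(grid), len(grid[0])

    def h(c: Cell) -> int:
        # Manhattan distance: admissible and consistent for unit-cost 4-connected moves.
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    g: Dict[Cell, int] = {start: 0}
    came_from: Dict[Cell, Cell] = {}
    open_heap = [(h(start), 0, start)]
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > g[cur]:
            continue  # stale heap entry superseded by a cheaper one
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = cost + 1
                if ng < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cur
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None


# Hypothetical floor plan: '#' marks walls.
FLOOR_PLAN = [
    "....#....",
    ".##.#.##.",
    ".#.....#.",
    ".#.###.#.",
    "...#.....",
]

print(astar(FLOOR_PLAN, (0, 0), (0, 8)))
```

The heuristic is what distinguishes A* from Dijkstra's algorithm: cells closer to the goal are expanded first, while optimality is preserved because the Manhattan distance never overestimates the remaining cost.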

Furthermore, the detection algorithm was designed to work without any prior knowledge, such as layer 
names or metadata of the different objects. Recognized figures are stored to build a library for further 
research. Task completion time also differs significantly between the A*-based straight path and FOVPath: 
the median time to complete the task was 33.88 s for the straight path versus 23.32 s for the 
FOVPath [59]. 
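As a quick check on the two medians reported in [59], the relative reduction in median task completion time of FOVPath compared with the A*-based straight path works out as follows (simple arithmetic on the reported values, not a figure stated in [59]):

$$
\frac{33.88\,\mathrm{s} - 23.32\,\mathrm{s}}{33.88\,\mathrm{s}} = \frac{10.56}{33.88} \approx 0.312
$$

That is, the median FOVPath completion time is roughly 31% shorter than that of the straight path.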

With enhanced augmented reality technologies such as object recognition and computer vision, 
location-based augmented reality becomes more interactive. This application comprises three major parts: a 
database, a search engine, and an output engine. The application passes screen captures and the user's 
location data to the search engine, which matches them against the database. The output engine then 
displays the match results, including the augmented reality components, the 3D model, and information about 
the attraction, on the screen of the mobile device. The characteristics of the campus attractions are extracted 
with SIFT and stored as one-dimensional vectors. SIFT uses distinctive local keypoints of an image to 
recognize it under various conditions, including rotated or distorted images. 

The output engine places a 3D model at a particular spot based on a convolutional neural network 
(CNN) model, which determines the specific position of the 3D model at each attraction point. The 
application achieves an attraction recognition accuracy of 90%. The convolution process is time-consuming 
unless the training process is eliminated [60]. The architecture of the output engine is shown in Figure 3. 

Figure 3. Architecture of output engine 
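Matching the SIFT features of a captured frame against the stored attraction features, as the search engine above does, is commonly done with nearest-neighbour search and Lowe's ratio test. The sketch below assumes descriptors have already been extracted (real SIFT descriptors are 128-dimensional; short hypothetical vectors are used here) and picks the attraction with the most accepted matches. The ratio 0.8 follows Lowe's original SIFT paper, while the voting rule is an illustrative assumption, not the method of [60].

```python
import math
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def ratio_matches(query: List[Descriptor], ref: List[Descriptor], ratio: float = 0.8) -> int:
    """Count query descriptors whose nearest reference neighbour passes Lowe's ratio test."""
    if len(ref) < 2:
        return 0  # the test needs a best and a second-best neighbour
    good = 0
    for q in query:
        dists = sorted(math.dist(q, r) for r in ref)
        # Accept only if the best match is clearly closer than the second best.
        if dists[0] < ratio * dists[1]:
            good += 1
    return good


def recognise(query: List[Descriptor], database: Dict[str, List[Descriptor]]) -> str:
    """Return the attraction whose stored descriptors receive the most good matches."""
    return max(database, key=lambda name: ratio_matches(query, database[name]))


# Hypothetical descriptor database for two campus attractions.
DATABASE = {
    "library": [(0.0, 0.0, 1.0), (5.0, 5.0, 5.0), (9.0, 0.0, 0.0)],
    "gate": [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (3.0, 3.0, 3.0)],
}
frame = [(0.0, 0.0, 1.1), (9.0, 0.0, 0.2)]
print(recognise(frame, DATABASE))  # library
```

The ratio test rejects ambiguous matches, which matters for indoor scenes with repeated structures where a descriptor may be almost equally close to several stored features.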


Recently, the building information modelling (BIM) tracker system [61] has been used as a visual 
tracking framework that performs localization by matching image streams captured with a camera against a 3D 
model of the building. The study's principal hypothesis is that centimeter-level localization precision can be 
achieved without drift by combining image information with a 3D building model. The camera pose is 
determined by Gauss-Newton least-squares minimization, which iteratively minimizes the re-projection 
errors within an M-estimator sample consensus (MSAC) framework. Blender's ray-tracing algorithm renders 
the virtual view and defines the visible edges together with their corresponding 3D coordinates in the BIM 
coordinate system. The Canny edge detector is used to identify edges in the image. The 3D points are then 
sampled and back-projected onto the image plane, and correspondences are formed by searching for image 
edges along the direction of the back-projected model edges at the sampled points. The MSAC estimator is 
used to eliminate incorrect 3D-to-2D correspondences, which can be caused by shadows or repetitive texture. 
Although a direct solution is faster, the iterative method provides more accurate estimates, so the iterative 
Gauss-Newton approach was applied. Recent systems use advanced feature tracking and augmented reality 
methods during navigation. Feature-based 3D point cloud localization requires a pre-deployment stage in 
which the indoor environment is 3D-scanned and stored as anchors. These anchors are stored in a database 
together with their corresponding locations and navigation-related data, which are superimposed on the visual 
feed by the navigation assistant. 
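The MSAC estimator used by the BIM tracker [61] differs from plain RANSAC in how it scores a hypothesis: inliers contribute their squared residual and outliers a constant penalty T², instead of every inlier counting equally. The sketch below shows this scoring on a toy 2D line-fitting problem rather than on camera pose estimation; the threshold, iteration count, seed, and data are illustrative assumptions.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def msac_line(points: List[Point], threshold: float = 0.5, iterations: int = 200,
              seed: int = 0) -> Optional[Tuple[float, float]]:
    """Fit y = a*x + b with MSAC and return (a, b) of the lowest-cost hypothesis."""
    rng = random.Random(seed)  # seeded for reproducibility
    t2 = threshold ** 2
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        # Minimal sample: two points define a line hypothesis.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # MSAC cost: squared residual for inliers, constant t2 for outliers.
        cost = sum(min((y - (a * x + b)) ** 2, t2) for x, y in points)
        if cost < best_cost:
            best, best_cost = (a, b), cost
    return best


# Ten points on y = 2x + 1 plus three gross outliers (e.g. edges caused by shadows).
data = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(2.0, 20.0), (5.0, -10.0), (7.0, 30.0)]
print(msac_line(data))  # (2.0, 1.0)
```

In a full pipeline, the best MSAC hypothesis and its inliers would then be refined by an iterative least-squares step, which is the role Gauss-Newton plays in [61].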

Based on the anchors recognized in the camera and sensor feeds, the user's current position and 
orientation are determined. After the routes have been fully examined, the images are passed to the ARCore 
SDK for AR data overlay. The A* algorithm is used to compute the shortest path in the system [3]. 
To further improve performance and simplify indoor navigation, a subsequent model aims to reduce the use of 
hardware components and of technologies such as artificial intelligence and deep learning for navigation, 
relying instead on the cloud combined with augmented reality. When the application starts, the user specifies 
the destination by voice message, the virtual path is then loaded using the anchors, and virtual arrow signs 
guide the user to the destination. The user must point the phone's camera at the anchors, and a sound plays 
when the user walks past each anchor. After each anchor is passed, the next anchor begins emitting sound until 
the user arrives at the destination [62]. To apply passive tracking, the modular architecture shown in 
Figure 4 must be followed. 

Figure 4. Modular architecture of passive tracking 
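The anchor-by-anchor guidance of [62] (a sound cue at each anchor, after which the next anchor becomes active until the destination is reached) behaves like a small state machine. The class and method names below are hypothetical, anchor recognition itself (handled through the cloud AR service in [62]) is assumed to happen elsewhere, and the returned strings stand in for playing a sound and drawing arrows.

```python
from typing import List


class AnchorGuide:
    """Step through an ordered list of anchor IDs along a precomputed route (hypothetical)."""

    def __init__(self, route: List[str]):
        if not route:
            raise ValueError("route must contain at least one anchor")
        self.route = route
        self.index = 0  # position in the route of the anchor the user should reach next

    @property
    def arrived(self) -> bool:
        return self.index >= len(self.route)

    def on_anchor_recognised(self, anchor_id: str) -> str:
        """Handle an anchor recognised in the camera feed and return the UI action."""
        if self.arrived:
            return "arrived"
        if anchor_id != self.route[self.index]:
            # Off-route or out-of-order anchor: keep pointing at the expected one.
            return f"guide to {self.route[self.index]}"
        self.index += 1  # anchor passed: play the sound cue and advance
        if self.arrived:
            return "sound; arrived"
        return f"sound; guide to {self.route[self.index]}"


guide = AnchorGuide(["a1", "a2", "a3"])
for seen in ("a2", "a1", "a2", "a3"):
    print(seen, "->", guide.on_anchor_recognised(seen))
```

Ignoring out-of-order anchors keeps the guidance consistent with the planned route even when the camera briefly sees an anchor further ahead.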


2.2. Active tracking 

Early research in this area integrated RFID positioning [44], augmented reality, and mobile 
navigation to build a 3D augmented reality mobile navigation system. In Wang et al. [17], the hardware of 
the RFID indoor positioning system combines a practical RFID reader and tags. The tags used in the RFID 
positioning method allow the system to obtain individual location information. The navigation information 
server sends navigation information about the exhibition area where visitors are located to their mobile 
devices. The system's markerless augmented reality technology can be combined with positioning to improve 
image recognition performance. 

Zegeye et al. [63] regard the combination of Wi-Fi and GIS as the new trend in indoor tracking, and 
they merge mobile AR technology with Wi-Fi positioning technology. They apply Wi-Fi received signal 
strength (RSS) measurements in the target indoor environment to construct radio maps with the Wi-Fi 
fingerprinting approach. The fingerprinting data give a coarse RSS estimate for all reachable access points 
(APs). The radio maps are stored as a 7x138 comma-separated values (CSV) file of RSS values. When a user 
requests a position estimate, RSS measurements gathered on the fly are sent from the client Android 
application to the server as an extensible markup language (XML) file. The server (a remote laptop 
computer) parses the received XML file and estimates the position by running the localization algorithm. 
The estimated location is sent back to the client Android application as an XML file, and the same 
application displays the predicted location to the user. The system was able to determine the user's position 
67% of the time over 42 test positions. 
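The client/server exchange described for [63] can be sketched with Python's standard `csv` and `xml.etree.ElementTree` modules. The XML element and attribute names, the CSV layout (one row per reference point: a location label followed by one RSS column per AP), the placeholder value for unheard APs, and the use of the single nearest fingerprint in Euclidean RSS space are all assumptions for illustration; [63] does not specify them here.

```python
import csv
import io
import math
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical radio map: location label, then RSS (dBm) for each access point.
RADIO_MAP_CSV = """location,ap1,ap2,ap3
room101,-45,-70,-80
room102,-60,-50,-75
corridor,-70,-65,-55
"""


def load_radio_map(text: str) -> Tuple[List[str], List[Tuple[str, List[float]]]]:
    """Parse the CSV radio map into AP names and (label, RSS vector) rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [(row[0], [float(v) for v in row[1:]]) for row in reader if row]
    return header[1:], rows


def build_request(rss: Dict[str, float]) -> str:
    """Client side: serialise on-the-fly RSS readings as an XML document."""
    root = ET.Element("rssScan")
    for ap, value in rss.items():
        ET.SubElement(root, "ap", id=ap, rss=str(value))
    return ET.tostring(root, encoding="unicode")


def handle_request(xml_text: str, radio_map_csv: str, missing: float = -100.0) -> str:
    """Server side: parse the XML scan, match it to the radio map, reply in XML."""
    aps, rows = load_radio_map(radio_map_csv)
    scan = {e.get("id"): float(e.get("rss")) for e in ET.fromstring(xml_text).iter("ap")}
    # APs not heard in this scan get a weak placeholder RSS value.
    vector = [scan.get(ap, missing) for ap in aps]
    label, _ = min(rows, key=lambda row: math.dist(vector, row[1]))
    reply = ET.Element("position")
    reply.text = label
    return ET.tostring(reply, encoding="unicode")


request = build_request({"ap1": -47, "ap2": -68, "ap3": -82})
print(handle_request(request, RADIO_MAP_CSV))  # <position>room101</position>
```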

Subsequent research addressed some of these problems and improved the indoor tracking process 
using geographic information systems and sensor measurements. The augmented reality engine application 
(AREA) framework [19] comprises a portable augmented reality kernel that enables location-based mobile 
augmented reality applications. The AREA kernel consists of three algorithms: a tracking algorithm, a points 
of interest (POIs) algorithm, and a clustering algorithm. Four technical issues were vital in developing the 
kernel: POIs must be displayed correctly even when the device is held at a slant; POIs must be shown to the 
user accurately and efficiently; the POI concept must be integrated with the major mobile operating systems 
(iPhone operating system (iOS), Android, and Windows Phone); and the kernel must handle clusters of POIs. 
AREA's approach to relating the user to the objects shown in the camera view rests on five aspects. A virtual 
3D world is used to relate the user's location to the objects, with the user positioned at the origin of this 
world. Instead of the physical camera, a virtual 3D camera operates within this virtual 3D world. 
The sensors of the supported mobile operating systems drive this virtual 3D world: the physical camera of 
the phone is mapped to the virtual 3D camera based on the sensor data. 

Other research developed an indoor tracking system to address low positioning accuracy. The pedestrian 
tracking algorithm [18] uses indoor environment constraints within a grid-based indoor model to improve the 
localization of a Wi-Fi-based system. Indoor space is partitioned into grid cells of a specific size with 
corresponding semantics, and the precision of the grid model depends on the cell size. The algorithm 
repeatedly estimates the location probability over these cells from indoor positioning and magnetometer 
measurements taken by a mobile phone. Tracking errors, such as implausible locations, wrong headings, and 
jumps between consecutive locations, arise with the Wi-Fi positioning system and reduce dynamic tracking 
efficiency. To reduce prediction error, the tracking results are checked against the measurements every three 
tracking intervals. The grid filter is a discrete Bayesian filter that probabilistically determines a target's 
position from sensor measurements, and the tracking system estimates positions over time with a Markov chain 
model. The resulting tracking algorithm, based on Wi-Fi positioning technology, provides meter-level location 
precision, with 92% of positions within 3.5 m of error. To apply active tracking, the modular architecture 
shown in Figure 5 must be followed. 

Figure 5. Modular architecture of active tracking 
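The grid filter of [18] is a discrete Bayesian filter over grid cells. A minimal one-dimensional sketch (a corridor of cells) is shown below. The Markov motion model (stay, or move one cell either way, with fixed probabilities), the Gaussian measurement likelihood centred on a Wi-Fi position fix, and the noise values are illustrative assumptions, and the indoor-map constraints of [18] are reduced to forbidding blocked cells.

```python
import math
from typing import List, Set


def normalise(p: List[float]) -> List[float]:
    total = sum(p)
    return [v / total for v in p]


def predict(belief: List[float], blocked: Set[int], p_stay: float = 0.6) -> List[float]:
    """Markov-chain motion step: stay put, or move one cell left or right.

    Probability that would enter a blocked cell or leave the corridor stays in
    the current cell instead (a crude stand-in for map constraints).
    """
    n = len(belief)
    p_move = (1.0 - p_stay) / 2
    new = [0.0] * n
    for i, b in enumerate(belief):
        for j, w in ((i, p_stay), (i - 1, p_move), (i + 1, p_move)):
            target = j if 0 <= j < n and j not in blocked else i
            new[target] += b * w
    return normalise(new)


def update(belief: List[float], blocked: Set[int], wifi_cell: float,
           sigma: float = 1.5) -> List[float]:
    """Bayes update: multiply by a Gaussian likelihood centred on the Wi-Fi fix."""
    post = [0.0 if i in blocked else b * math.exp(-((i - wifi_cell) ** 2) / (2 * sigma ** 2))
            for i, b in enumerate(belief)]
    return normalise(post)


def grid_filter(n_cells: int, blocked: Set[int], fixes: List[float]) -> List[float]:
    """Run predict/update for each Wi-Fi fix, starting from a uniform prior over free cells."""
    free = [i for i in range(n_cells) if i not in blocked]
    belief = [1.0 / len(free) if i not in blocked else 0.0 for i in range(n_cells)]
    for z in fixes:
        belief = update(predict(belief, blocked), blocked, z)
    return belief


# Hypothetical corridor of 10 cells with cell 4 blocked; Wi-Fi fixes in cell units.
posterior = grid_filter(10, {4}, [6.0, 6.5, 7.0, 7.2])
print(max(range(10), key=lambda i: posterior[i]))
```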


2.3. Hybrid tracking 

A hybrid indoor augmented reality tracking system registers virtual objects at the correct positions 
in the real view. In [23], the feature extraction algorithms are improved by merging FAST with SURF into 
FAST-SURF, which meets the requirements of robustness and real-time performance. The mobile phone's 
position is determined with the fingerprinting technique: a fingerprint database of the particular indoor 
environment is built, and the RSS values of the access points detected at each point are compared and 
matched with the stored records using the KNN algorithm to determine the position. 
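The KNN fingerprint matching described for [23] can be sketched as follows: the RSS vector measured at the phone is compared with each stored fingerprint, and the positions of the k closest fingerprints are averaged. The database values, k = 3, Euclidean distance in RSS space, and unweighted averaging are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Each fingerprint: ((x, y) in metres, RSS vector in dBm, one entry per access point).
Fingerprint = Tuple[Tuple[float, float], Sequence[float]]


def knn_position(rss: Sequence[float], db: List[Fingerprint], k: int = 3) -> Tuple[float, float]:
    """Estimate (x, y) as the mean position of the k nearest fingerprints in RSS space."""
    if not 1 <= k <= len(db):
        raise ValueError("k must be between 1 and the number of fingerprints")
    nearest = sorted(db, key=lambda fp: math.dist(rss, fp[1]))[:k]
    xs = [pos[0] for pos, _ in nearest]
    ys = [pos[1] for pos, _ in nearest]
    return sum(xs) / k, sum(ys) / k


# Hypothetical fingerprint database for three access points.
DB: List[Fingerprint] = [
    ((0.0, 0.0), (-40, -70, -80)),
    ((2.0, 0.0), (-50, -60, -78)),
    ((0.0, 2.0), (-48, -72, -65)),
    ((10.0, 10.0), (-85, -45, -50)),
]
print(knn_position((-45, -66, -76), DB))
```

Averaging several neighbours lets the estimate fall between reference points, at the cost of being pulled toward distant fingerprints when k is too large; weighting neighbours by inverse RSS distance is a common variant.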

The system in [5] aims to track people automatically, using maps of Wi-Fi networks to determine 
people's positions. It is a multi-agent system that controls the hardware of both the trolley and the client 
detection agent, which is responsible for detecting and calibrating the client. The main job of the trolley is to 
follow clients during their shopping. Advances in research have reduced positioning error and led to the 
emergence of Wi-Fi network mapping. Data collected as the vehicle moves are used to capture signal maps, 
which reduces the need for manual calibration and makes data updates easier. A Bayesian network classifier 
determines the final position by combining the data provided by the wireless networks. An obstacle detection 
agent prevents collisions while the trolley is moving, using HC-SR04 distance sensors to detect obstacles. A 
tablet scans the Wi-Fi networks and beacons using an Edimax EW-7611UL USB adapter, which supports both 
Wi-Fi and Bluetooth. A data agent is in charge of managing the database. 
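The Bayesian network classifier in [5] fuses readings from several wireless sources into a final position. The simplest Bayesian network classifier is naive Bayes, which assumes the readings are conditionally independent given the location; the sketch below uses it with Gaussian per-zone statistics. The zone names, means, standard deviations, and priors are hypothetical, and [5] may use a richer network structure.

```python
import math
from typing import Dict, Tuple

# Per-zone model: signal source -> (mean RSS in dBm, standard deviation).
ZoneModel = Dict[str, Tuple[float, float]]


def log_gaussian(x: float, mean: float, std: float) -> float:
    """Log of the normal density N(x; mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def classify(reading: Dict[str, float], models: Dict[str, ZoneModel],
             priors: Dict[str, float]) -> str:
    """Naive Bayes: pick the zone maximising log P(zone) + sum over sources of log P(rss | zone).

    Every source in `reading` must have statistics in every zone model.
    """
    best_zone, best_score = "", -math.inf
    for zone, model in models.items():
        score = math.log(priors[zone])
        for source, value in reading.items():
            mean, std = model[source]
            score += log_gaussian(value, mean, std)
        if score > best_score:
            best_zone, best_score = zone, score
    return best_zone


# Hypothetical statistics for two shop zones (two Wi-Fi APs and one Bluetooth beacon).
MODELS: Dict[str, ZoneModel] = {
    "checkout": {"wifi_ap1": (-50.0, 4.0), "wifi_ap2": (-75.0, 5.0), "beacon_1": (-60.0, 3.0)},
    "aisle_3": {"wifi_ap1": (-70.0, 4.0), "wifi_ap2": (-55.0, 5.0), "beacon_1": (-80.0, 3.0)},
}
PRIORS = {"checkout": 0.5, "aisle_3": 0.5}

print(classify({"wifi_ap1": -52, "wifi_ap2": -73, "beacon_1": -62}, MODELS, PRIORS))  # checkout
```

Working in log space avoids numerical underflow when many sources are multiplied together.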

Wu et al. [24] use augmented reality, deep learning, and the cloud to address the fact that GPS does 
not work well for indoor navigation. The user is asked to turn on the phone camera and scan the surrounding 
environment, which is matched against the database. The Python recognition system takes the first entry in 
the database and matches it with the image recognition module [64]. The images are uploaded to the back 
end, where they are matched with You Only Look Once version 3 (YOLOv3) features [65]. The images are 
processed by binarization and contour detection, and redundant edges are cut by scaling [66], [67]. The KNN 
algorithm was used in a previously trained digit recognition module to remove unnecessary features 
[68]-[70] and to determine the user's initial location [71], [72]. The A* search algorithm [73], [74] 
determines the shortest route from the initial point to the destination and suggests it to the user [75], [76]. 
During indoor navigation, users can view AR electronic bulletin boards at particular locations that present 
relevant information about those locations [77]. The common characteristics of most indoor tracking systems 
are shown in Table 3. 


Table 3. Characteristics of most indoor tracking systems 
With low light Work in Use predefined — Use hardware 


Indoor Outdoor Marker Markerless Cloud 


intensity multi -level map requirements 
V V NA V NA V V NA NA 
V NA V NA NA V NA V NA 
V NA NA V V NA V NA V 
V NA NA NA NA NA V NA NA 
V NA V V NA V NA V NA 
V NA NA V V NA V NA NA 
V NA V NA NA NA NA V NA 
V V NA V NA V V NA NA 
V V NA V NA NA V NA NA 
V NA NA V V NA NA NA NA 
V NA NA V V NA NA NA NA 
V NA V NA NA V NA V NA 
V NA NA V NA V NA V V 
V NA NA V V NA NA NA V 
V NA NA V V NA NA NA V 
V NA NA V NA NA NA V V 
3. RESULTS AND DISCUSSION 


The main objective of indoor tracking systems is to guide people through complex and unfamiliar 
environments. Tracking systems help users reach their desired destination with minimum congestion and time 
consumption. Techniques adapted from outdoor tracking cannot be used for indoor tracking because of extra 
challenges such as light intensity and the absence of GPS. Tracking can be divided into active, passive, and 
hybrid tracking. 

Active tracking systems depend on one or more communication technologies, such as Wi-Fi
or Bluetooth. To reduce computational complexity, researchers try to improve connectivity and
localization accuracy. Researchers usually use the weighted centroid localization (WCL) algorithm to enhance
indoor tracking. Active indoor tracking is commonly used in low-cost systems, and it also solves the tracking
problem under low light intensity. The current challenge in active indoor tracking is its low accuracy in complex
environments with multiple walls and floors, which weaken the signal strength.
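Weighted centroid localization estimates the user's position as a weighted average of the known anchor (access point or beacon) coordinates, so that anchors with stronger signals pull the estimate toward themselves. The sketch below assumes a log-distance path-loss model to convert RSSI into distance, with weights w_i = 1/d_i^g. The reference power `p0_dbm`, the path-loss exponent `n`, and the degree `g` are hypothetical defaults that would need calibration in a real building.

```python
from typing import List, Tuple

def rssi_to_distance(rssi_dbm: float, p0_dbm: float = -40.0, n: float = 2.0) -> float:
    # Log-distance path-loss model: RSSI = P0 - 10*n*log10(d), P0 = RSSI at 1 m (assumed values).
    return 10 ** ((p0_dbm - rssi_dbm) / (10 * n))

def weighted_centroid(anchors: List[Tuple[float, float]], rssi: List[float],
                      g: float = 1.0, p0_dbm: float = -40.0, n: float = 2.0) -> Tuple[float, float]:
    """Estimate position as sum(w_i * p_i) / sum(w_i), with w_i = 1 / d_i**g."""
    if not anchors or len(anchors) != len(rssi):
        raise ValueError("need one RSSI reading per anchor")
    # Closer anchors (stronger RSSI, smaller estimated distance) get larger weights.
    weights = [1.0 / rssi_to_distance(r, p0_dbm, n) ** g for r in rssi]
    total = sum(weights)
    x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
    y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
    return x, y

# Four access points at the corners of a 10 m x 10 m room (hypothetical layout).
aps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
print(weighted_centroid(aps, [-50.0, -65.0, -65.0, -70.0]))  # roughly (1.91, 1.91), near (0, 0)
```

WCL is cheap to compute, because it needs only one weighted average per position fix. Its accuracy, however, depends on how well RSSI reflects distance, which is exactly what walls and floors disturb.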

Passive tracking systems depend on one or more image detection algorithms and use preloaded maps to
enhance users' navigation. These techniques usually use a route-planning algorithm that calculates the
distance between the current location and the target, which reduces the computation required for real-time
tracking. Passive indoor tracking systems are usually used to solve the problem of complex environments;
however, low light intensity can reduce their accuracy.
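Image-based tracking typically localizes the camera by matching feature points in the current frame against pre-stored images of the environment. As a hedged illustration, the sketch below performs brute-force matching of binary descriptors, of the kind produced by detectors such as ORB, using Hamming distance and Lowe's ratio test. For simplicity, descriptors are represented here as plain Python integers. A real system would obtain them from a computer vision library, and the 0.8 ratio threshold is an assumed value.

```python
from typing import List, Tuple

def hamming(a: int, b: int) -> int:
    # Number of differing bits between two binary descriptors stored as ints.
    return bin(a ^ b).count("1")

def match_descriptors(query: List[int], train: List[int],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Match each query descriptor to its nearest train descriptor, keeping only
    matches whose best distance is clearly smaller than the second best (ratio test)."""
    matches: List[Tuple[int, int]] = []
    if len(train) < 2:
        return matches  # the ratio test needs at least two candidates
    for qi, q in enumerate(query):
        dists = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches

query = [0x01, 0xF1, 0x3C]   # toy 8-bit descriptors from the live camera frame
stored = [0x00, 0xF0, 0x0F]  # toy descriptors from a pre-stored environment image
print(match_descriptors(query, stored))  # [(0, 0), (1, 1)]; 0x3C is ambiguous and rejected
```

The ratio test discards feature points that look equally similar to several stored points. Such ambiguous matches are common in repetitive indoor scenes such as corridors.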

Hybrid tracking systems integrate communication technologies with image detection algorithms in
one system. The communication technology module calculates the position and orientation of the mobile
device, while the image detection module extracts feature points and matches them between the current frame
and offline images of the environment. Virtual route-tracking content is then overlaid on the real-time camera
view. The advantage of hybrid indoor tracking systems is that sensor data enhances image-based tracking.
However, as in active tracking, signal strength is affected in complex environments. We observed that active
tracking is better suited when the indoor environment is covered by Wi-Fi and the fingerprinting technique is
used. Passive tracking is more convenient when environment maps exist, since image detection and computer
vision algorithms can then improve tracking accuracy.
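Wi-Fi RSS fingerprinting usually has two phases. In the offline phase, received signal strength (RSS) fingerprints are recorded at known reference points to build a radio map. In the online phase, a live scan is compared against that map. A minimal k-nearest-neighbour (kNN) sketch of the online phase follows. The Euclidean signal-space distance, the -100 dBm fill value for access points missing from a scan, and the value of k are assumptions chosen for illustration, not parameters reported by the surveyed systems.

```python
import math
from typing import Dict, List, Tuple

Fingerprint = Dict[str, float]  # access-point ID -> RSS in dBm
Position = Tuple[float, float]

def rss_distance(a: Fingerprint, b: Fingerprint, missing_dbm: float = -100.0) -> float:
    # Euclidean distance in signal space; an AP absent from one scan is treated as very weak.
    aps = set(a) | set(b)
    return math.sqrt(sum((a.get(ap, missing_dbm) - b.get(ap, missing_dbm)) ** 2 for ap in aps))

def knn_locate(radio_map: List[Tuple[Position, Fingerprint]], scan: Fingerprint, k: int = 3) -> Position:
    """Online phase: average the positions of the k reference points closest in signal space."""
    if not radio_map or k < 1:
        raise ValueError("radio map must be non-empty and k >= 1")
    nearest = sorted(radio_map, key=lambda entry: rss_distance(entry[1], scan))[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return x, y

# Offline phase (hypothetical survey data): reference positions in metres and their fingerprints.
radio_map = [
    ((0.0, 0.0), {"ap1": -40.0, "ap2": -80.0}),
    ((0.0, 5.0), {"ap1": -45.0, "ap2": -75.0}),
    ((10.0, 0.0), {"ap1": -80.0, "ap2": -40.0}),
    ((10.0, 5.0), {"ap1": -75.0, "ap2": -45.0}),
]
print(knn_locate(radio_map, {"ap1": -42.0, "ap2": -78.0}, k=2))  # (0.0, 2.5)
```

Because fingerprinting relies only on radio signals, it does not depend on lighting. Its cost lies in the offline survey, which must be repeated whenever access points are moved.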

Hybrid tracking is usually used when the environment is covered by Wi-Fi, combining Wi-Fi positioning with
image detection and computer vision algorithms. A hybrid tracking model is more effective and accurate than
either active or passive tracking alone. Hybrid tracking works in low-light environments because it relies on
Wi-Fi RSS fingerprinting to determine the user's location, while its image detection algorithms make it more
effective for the user by showing the tracked route in the real environment. Table 4 shows when to use each of
the three types of tracking techniques.
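One simple way to combine the two modules of a hybrid system is to fuse their position estimates by inverse-variance weighting, falling back to the Wi-Fi estimate when the camera cannot provide one (for example, in very low light). The sketch below assumes that both modules report a 2-D position with a scalar variance and that their errors are independent. The function name `fuse_estimates` is hypothetical, and the surveyed systems may use other fusion schemes, such as Kalman filtering.

```python
from typing import Optional, Tuple

Position = Tuple[float, float]

def fuse_estimates(wifi_xy: Position, wifi_var: float,
                   vision_xy: Optional[Position], vision_var: Optional[float]) -> Tuple[Position, float]:
    """Inverse-variance fusion of a Wi-Fi and a vision position estimate.

    Returns the fused position and its variance. If the vision module produced no
    estimate (vision_xy is None, e.g. too dark to match features), Wi-Fi is used alone.
    """
    if wifi_var <= 0:
        raise ValueError("wifi_var must be positive")
    if vision_xy is None or vision_var is None:
        return (float(wifi_xy[0]), float(wifi_xy[1])), wifi_var
    if vision_var <= 0:
        raise ValueError("vision_var must be positive")
    # The less uncertain estimate (smaller variance) receives the larger weight.
    w_wifi, w_vision = 1.0 / wifi_var, 1.0 / vision_var
    total = w_wifi + w_vision
    x = (w_wifi * wifi_xy[0] + w_vision * vision_xy[0]) / total
    y = (w_wifi * wifi_xy[1] + w_vision * vision_xy[1]) / total
    return (x, y), 1.0 / total  # fused variance is never larger than either input

print(fuse_estimates((0.0, 0.0), 4.0, (3.0, 3.0), 1.0))  # vision trusted more: about ((2.4, 2.4), 0.8)
print(fuse_estimates((1.0, 2.0), 4.0, None, None))        # ((1.0, 2.0), 4.0)
```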


Table 4. Indoor tracking systems features

Active
  Tracking space: small space; single floor
  Physical world parameters: not affected by light intensity; affected by occlusion
  User perspective: easy to scale; does not need a predefined map; system needs extra hardware
Passive
  Tracking space: complex space; multiple floors
  Physical world parameters: affected by light intensity; not affected by occlusion
  User perspective: hard to scale; needs a predefined map; system does not need extra hardware
Hybrid
  Tracking space: complex space; multiple floors
  Physical world parameters: not affected by light intensity; not affected by occlusion
  User perspective: hard to scale; needs a predefined map; system needs extra hardware


As shown in Table 4, active tracking is preferable when the user environment is small and on a single floor,
whereas passive tracking is preferable when the environment is complex and spans multiple floors. When
lighting in the environment is weak or almost non-existent, active or hybrid tracking is preferable. In active
tracking, occlusion in complex environments with multiple walls and floors reduces the system's accuracy.
The cost of active and hybrid tracking is high because they require additional hardware components.
Active tracking is easier to scale than passive and hybrid tracking because it does not require
predefined maps.
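The selection guidance above can be summarized as a small decision function. This is one hedged reading of Table 4, not a rule stated by the authors. The flags `multi_floor`, `low_light`, and `hardware_available` are hypothetical inputs. Hybrid tracking is chosen for dark multi-floor spaces because Table 4 lists it as unaffected by light intensity and suited to complex multi-floor spaces.

```python
def suggest_tracking(multi_floor: bool, low_light: bool, hardware_available: bool = True) -> str:
    """Suggest 'active', 'passive', or 'hybrid' tracking (a sketch of the Table 4 guidance)."""
    if not hardware_available:
        # Only passive tracking works without extra hardware (e.g. Wi-Fi or beacon
        # infrastructure), although its accuracy drops in low light.
        return "passive"
    if low_light:
        # Vision alone struggles in the dark: active tracking suits small single-floor spaces,
        # while hybrid copes better where walls and floors occlude radio signals.
        return "hybrid" if multi_floor else "active"
    # Normal lighting: active for small single-floor spaces, passive for complex multi-floor ones.
    return "passive" if multi_floor else "active"

print(suggest_tracking(multi_floor=False, low_light=False))  # active
print(suggest_tracking(multi_floor=True, low_light=True))    # hybrid
```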


4. CONCLUSION 

Indoor tracking systems are usually implemented based on communication technologies or image
detection algorithms. This paper presents various indoor tracking systems for augmented reality
applications. We compared indoor tracking systems developed based on various communication
technologies, and similarly compared indoor tracking systems developed based on image detection
algorithms. This paper also provides a comprehensive review of relevant applications from 2011 to
2021. Active tracking is very stable, widely used, and computationally inexpensive, and it
yields systems that can integrate with 3D models. However, active tracking is often inadequate for
applications that need accurate tracking and registration. Passive tracking, on the other hand, is more
reliable and gives more accurate pose estimation and tracking, but it is computationally expensive and needs
powerful hardware. Hybrid tracking integrates sensor-based tracking and vision-based tracking to
overcome the limitations of each technique. Recently, hybrid methods have produced accurate
tracking that can run on handheld devices.